An Empirical Investigation of Discounting in Cross-Domain Language Models

Authors

  • Greg Durrett
  • Dan Klein
Abstract

We investigate the empirical behavior of n-gram discounts within and across domains. When a language model is trained and evaluated on two corpora from exactly the same domain, discounts are roughly constant, matching the assumptions of modified Kneser-Ney LMs. However, when training and test corpora diverge, the empirical discount grows essentially as a linear function of the n-gram count. We adapt a Kneser-Ney language model to incorporate such growing discounts, resulting in perplexity improvements over modified Kneser-Ney and Jelinek-Mercer baselines.
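
As a rough illustration of what an "empirical discount" can mean, the sketch below groups n-grams by their training count k and subtracts their average (size-normalized) count in a second corpus from k. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ngrams` and `empirical_discounts`, the whitespace-tokenized input, and the normalization by total n-gram count are all choices made here for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def empirical_discounts(train_tokens, test_tokens, n=2):
    """Map each training count k to (k - average scaled test count).

    Assumption: test counts are rescaled so that both corpora carry
    the same total number of n-gram tokens.
    """
    train_counts = Counter(ngrams(train_tokens, n))
    test_counts = Counter(ngrams(test_tokens, n))

    train_total = sum(train_counts.values())
    test_total = sum(test_counts.values())
    scale = train_total / test_total if test_total else 0.0

    test_mass = defaultdict(float)  # summed scaled test counts per k
    num_types = Counter()           # number of n-gram types per k
    for gram, k in train_counts.items():
        test_mass[k] += test_counts.get(gram, 0) * scale
        num_types[k] += 1

    return {k: k - test_mass[k] / num_types[k] for k in sorted(num_types)}


if __name__ == "__main__":
    train = "a b a b a b c d".split()
    test = "a b c d a b x y".split()
    for k, d in empirical_discounts(train, test, n=2).items():
        print(f"count {k}: empirical discount {d:.2f}")
```

On real data, plotting the returned discount against k for an in-domain and an out-of-domain test corpus is one way to check whether the curve looks roughly flat or roughly linear, which is the contrast the abstract describes.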

Similar resources

Nominalization in Academic Writing: A Cross-disciplinary Investigation of Physics and Applied Linguistics Empirical Research Articles

The present study aimed to explore how nominalization is manifested in a sample of Physics and Applied Linguistics research articles (RAs), representing hard and soft sciences respectively. To this end, 60 RAs from discipline-related professional journals were randomly selected and analyzed in light of Halliday and Matthiessen’s (1999) taxonomy of nominalization. Comparing the normalized freque...

Investigation and Statistical comparison of the soil empirical desalinization models for salin-sodic soils (Case study: Khuzestan province)

Accumulation of soluble salts in arid areas which are similar to most regions of Iran is inevitable in soil surface and profile because of low precipitation and high evaporation. High concentration of soluble salts in soil profile caused severe problems for root water uptake thus plant growth stopped. Reducing soil salinity to optimized content by leaching and avoiding soil pounding must be con...

Domain mining for machine translation

Massive amounts of data for data mining consist of natural language data. A challenge in natural language is to translate the data into a particular language. Machine translation can do the translation automatically. However, the models trained on data from a domain tend to perform poorly for different domains. One way to resolve this issue is to train domain adaptation translation and language...

Hysteresis: Phenomenon and Modeling in Soil-Water Relationship

Hysteresis has been widely recognized in the soil water relationship. In this paper, a detailed review of hysteresis was performed in relation to its models. So far, different models have been suggested to describe hysteresis in the water retention curve (WRC) that could be categorized into two main groups: conceptual and empirical models. The models in the first group are based on the domain t...

The Investigation of the Perspectives of Iranian EFL Domain Experts on Postmethod Pedagogy: A Delphi Technique

After the introduction of postmethod pedagogy by Kumaravadivelu with its three principles of particularity, possibility and practicality, a wave of attention was directed towards this so-called 'postmethod era' and its appropriacy and adequacy in satiating the demands of the language learners in this 'brand new world'. This situation has created a healthy debate among the Iranian EFL community ...

Publication date: 2011